Add Global MMLU Lite #2567
Conversation
@shivalika-singh Thanks for the PR; it mostly looks good. Just a couple of nits:
I think we can also add a group config. Groups are similar to tags in that both include multiple tasks, but the former also provides an aggregated metric.
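For reference, a minimal group config might look something like the sketch below. This is an illustrative assumption only: the group name, the task names, and the aggregation settings are placeholders, not the final config for this PR.

```yaml
# Hypothetical group config sketch (names are placeholders)
group: global_mmlu
task:
  - global_mmlu_ar
  - global_mmlu_bn
  - global_mmlu_hi
aggregate_metric_list:
  - metric: acc
    weight_by_size: true
```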
Hi @baberabb, sure, I'll update the readme, look into adding the group config, and update the PR shortly. Regarding the CLA, I've been trying to sign it for a while. I agreed to it, but somehow that's not reflected here. Now when I click the CLA link, it shows "you have agreed..." and I can no longer see a button to accept anything (as shown in the screenshot).
@shivalika-singh Hey, the CLA issue is because you pushed commits from an account different from the one you opened the PR with. See: cla-assistant/cla-assistant#661 (comment)
Hi @baberabb, I have updated the readmes and signed the CLA. I can look into adding the group config in a follow-up PR later this week, but it would be great if we could merge this for now if it looks good. Thanks!
Regarding implementing group config, I'm thinking for this dataset it probably makes sense to have these tasks under the "global_mmlu" group:
But my understanding is that to support this, I'll also have to update the dataset on Hugging Face. Right now on HF I have one subset per language (ar, hi, bn, etc.). Please let me know if my understanding is correct, or if you'd suggest doing it a different way. I can certainly add these changes in a follow-up PR if that sounds good to you.
Thanks for the updates! You should be able to use:

```yaml
process_docs: !function utils.process_docs  # <file>.<function_name>
```

and in `utils.py` (same folder) you can have:

```python
import datasets

def process_docs(df: datasets.Dataset) -> datasets.Dataset:
    # keep only the culturally sensitive rows, according to the subset
    return df.filter(lambda row: row["cultural_sensitivity_label"] == "CS")
```

You can also use `df.map()`. This will apply the filter to all the task datasets when you run the benchmark.
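To sanity-check the filtering logic outside the harness, here is a small stdlib-only sketch. The sample rows and the helper name `is_culturally_sensitive` are made up for illustration; the real `process_docs` operates on a `datasets.Dataset` via `.filter()`.

```python
# Hypothetical sketch (not part of the PR): the same CS/CA split that
# process_docs applies with datasets.Dataset.filter, shown on plain dicts.
docs = [
    {"question": "q1", "cultural_sensitivity_label": "CS"},
    {"question": "q2", "cultural_sensitivity_label": "CA"},
    {"question": "q3", "cultural_sensitivity_label": "CS"},
]

def is_culturally_sensitive(row: dict) -> bool:
    # mirrors: row["cultural_sensitivity_label"] == "CS"
    return row["cultural_sensitivity_label"] == "CS"

cs_docs = [row for row in docs if is_culturally_sensitive(row)]
print([d["question"] for d in cs_docs])  # ['q1', 'q3']
```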
Hi @baberabb, updated the readme again. The failing test from the previous commit should pass now. Thanks for explaining process_docs; I'll test that and add it in a follow-up PR shortly.
Hi @baberabb
Reopening the PR for integrating Global MMLU with eval harness. I followed the instructions here and made sure the pre-commit checks pass. Hopefully the tests pass this time.
This PR integrates the "lite" version of Global MMLU, which contains 200 CS (culturally sensitive) and 200 CA (culturally agnostic) samples across 15 languages, with human translations. We recommend this dataset for evaluating multilingual models and would like to integrate it with eval-harness.
This is the initial version of the PR based on our discussion here. Let me know if any changes are needed before we can merge this.
cc: @marziehf